An improved mountain gazelle optimizer based on chaotic map and spiral disturbance for medical feature selection

Feature selection is an important solution for dealing with high-dimensional data in machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the recently proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using the iterative chaotic map with infinite collapses (ICMIC), which increases population diversity. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Third, individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is applied to the optimal individual to enhance the exploitation and convergence capabilities of the algorithm. The superior ability of IMGO to solve continuous problems is demonstrated on 23 benchmark functions. BIMGO is then evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of fitness value, number of selected features, and sensitivity. In addition, statistical tests show that BIMGO is significantly better at selecting the most effective features in medical datasets.


I. Introduction
With the continuous development of medical informatics, the amount of medical data is growing rapidly. The term medical data covers data and information from multiple fields, such as consultation services, disease prevention, and health checkups. These data mainly include electronic medical records, medical images, laboratory data, vital-sign data, and personal health data. Medical data are a crucial diagnostic basis for doctors: they provide more comprehensive treatment clues and help doctors make more accurate diagnoses. Disease diagnosis is the cornerstone of prevention and treatment; through medical data analysis and early symptom detection, it determines the type, characteristics, and severity of a disease and provides the basis for early intervention and treatment. Diagnostic accuracy directly affects the success rate of treatment. Improving it can not only raise patients' cure rate, survival rate, survival time, and quality of life but also reduce their medical costs.
However, current medical data pose challenges such as large volume, multiple data types, high dimensionality, high value but low value density, and real-time requirements [1]. These challenges lead to high time costs, low diagnostic accuracy, and a reliance on empirical knowledge in disease diagnosis and research [2], which hinders improvements in patient recovery, survival rates, and quality of life as well as reductions in healthcare costs [3].
In this context, integrating machine learning techniques into medical diagnosis is emerging as a significant research trend [4,5]. Nonetheless, raw medical data contain numerous irrelevant and redundant features [6], which not only obstruct data analysis but also lead to the 'curse of dimensionality' [7]. Consequently, effectively extracting the essential information from raw data is important for improving both the accuracy and the efficiency of diagnosis.
Feature selection (FS) has long been a critical research area in machine learning. The objective of FS is to identify the most relevant and effective feature subsets [8,9]. FS is particularly important in medical data processing, where it helps select the most representative and useful features, thereby simplifying the model and improving the efficiency and accuracy of data processing. FS also improves model interpretability, enabling doctors to understand the decision-making process and thus increasing their trust in the model. Finally, eliminating irrelevant and redundant features yields more robust models [10].
However, searching the feature space for useful feature subsets is an NP-hard optimization problem [11][12][13]. Search methods can be broadly divided into three types: complete search algorithms, heuristic search algorithms, and random search algorithms [14]. Complete search is rarely used because it requires considerable computing power and its cost grows rapidly with the size of the data. Heuristic search algorithms generally have moderate search capabilities; they are prone to becoming trapped in local optima and cannot effectively handle the combinatorial explosion of the feature subset space [15]. Random search algorithms, represented by metaheuristic algorithms, use stochastic methods to obtain feature subsets, which allows them to explore a larger search space and avoid local optima more effectively [16,17]. Metaheuristics are primarily divided into single-solution-based and population-based algorithms [18]. The former optimize a single solution, whereas the latter maintain a group of solutions, termed a 'population', in each iteration, which is more effective at avoiding local optima [19]. Population-based metaheuristics can be further divided into evolution-based, human-based, physics-based, sports-based, light-based, and swarm intelligence algorithms [20,21].
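To make the combinatorial explosion concrete, the short Python sketch below counts the non-empty feature subsets that a complete (exhaustive) search would have to evaluate for a dataset with d features. It is only an illustration of the 2^d − 1 growth and is not part of the proposed method; the function name is hypothetical.

```python
import math

def count_nonempty_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets that a complete search must evaluate."""
    return 2 ** n_features - 1

for d in (10, 30, 100):
    n = count_nonempty_subsets(d)
    # math.log10 accepts arbitrarily large Python ints, so the order of magnitude is exact enough
    print(f"d = {d:>3}: {n} candidate subsets (about 10^{math.log10(n):.1f})")
```

Even at d = 100, far below the dimensionality of many medical datasets, exhaustive evaluation is already out of reach, which is why stochastic population-based search is used instead.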
Swarm intelligence algorithms can process medical data effectively through distributed computing, and their information-sharing mechanisms significantly improve model efficiency and adaptability. Their inherent flexibility and robustness also ensure stable performance even when individual agents err or fail. Importantly, swarm intelligence algorithms are effective at preventing premature convergence and achieve high optimization precision through collaborative decision-making and strategic search methods.
Swarm intelligence algorithms have achieved significant success in medical data FS. However, they still have difficulty balancing exploration and exploitation, maintaining population diversity, ensuring convergence, and tuning parameters [34,35]. Moreover, many studies that apply swarm intelligence algorithms to medical data FS address simple and limited problems, targeting only specific diseases or datasets, and their generalizability is insufficient for today's rapidly evolving medical data needs. In addition, some algorithms are inefficient on large-scale data. Therefore, given characteristics of medical data such as the variability and instability of data and feature volumes [36], more flexible and adaptive processing methods are needed.
The recently introduced mountain gazelle optimizer (MGO) is not affected by parameter settings because it is parameter-free [37][38][39]. In addition, the algorithm is designed to balance exploration and exploitation by using four different mechanisms at all optimization stages. Moreover, because the MGO uses many finite vectors, it has a good ability to escape local optima and can explore the whole search space. Furthermore, according to the experimental results of Abdollahzadeh et al., the MGO performs strongly on continuous problems, giving very good results as both the population size and the dimensionality vary.
However, the MGO still has certain limitations in terms of solution diversity, local search ability, and escaping local optima.Moreover, the current MGO algorithm is mainly used to solve continuous problems, and there is no binary version for solving feature selection problems.This situation prompted us to improve the MGO for these problems and apply it to feature selection tasks.
In this study, we first propose an improved mountain gazelle optimizer named IMGO, which uses an iterative chaotic map with infinite collapses (ICMIC) to initialize the gazelle population, uses a nonlinear factor to control the coefficient vectors, applies a spiral perturbation mechanism to perturb the positions of individuals, and performs an in-depth search of the neighborhood of the optimal individual. To verify its performance, IMGO is evaluated on 23 benchmark functions, and its superiority is demonstrated in comparison with the original algorithm and 8 well-known and recently proposed metaheuristic algorithms, namely, the WOA, PSO, GWO, marine predator algorithm (MPA) [40], NOA, Kepler optimization algorithm (KOA) [41], SSA, and BA. The binary version of IMGO (BIMGO) is then developed to handle the feature selection task. BIMGO is evaluated on 16 medical datasets and compared with the binary versions of the eight metaheuristic algorithms mentioned above. The experimental results show that BIMGO outperforms the comparison algorithms in terms of fitness value, number of selected features, and convergence. In addition, a Wilcoxon rank-sum test at the 5% significance level verifies that BIMGO performs significantly better than the competing algorithms on most datasets.
The main contributions of this paper are summarized as follows:
a. To address the low population diversity of the original algorithm, ICMIC chaotic mapping replaces random initialization in the population initialization stage.
b. A nonlinear control factor is introduced to strengthen the global search in the early stage and the local search in the late stage, which improves the search efficiency of the algorithm.
c. To enhance the local search ability and help the algorithm escape local traps, a spiral perturbation mechanism perturbs the positions of individual gazelles during the iterations.
d. The algorithm searches the neighborhood of the optimal individual at the end of each iteration, which effectively enhances its exploitation capability.
e. A binary version of IMGO is developed to apply the proposed algorithm to feature selection, making it well suited to feature selection on medical data.
The remainder of this paper is structured as follows. Section II reviews related progress on using swarm intelligence algorithms to analyze medical data. Section III presents the MGO algorithm and the proposed IMGO and BIMGO algorithms. Section IV evaluates IMGO on 23 benchmark functions and verifies the ability of BIMGO to extract effective features from medical data. The last section presents the conclusions and future research directions.

II. Related works
Swarm intelligence algorithms have shown significant advantages in feature selection tasks, including those on medical data, through mechanisms such as distributed computing, information sharing, and collaborative decision-making. This section reviews related work on applying swarm intelligence algorithms to medical data analysis, ranging from early research on identifying important disease factors to more recent research on aiding medical diagnosis.
Chen et al. [42] combined PSO with the 1-nearest neighbor (1-NN) classifier to effectively identify important factors in patients with obstructive sleep apnea (OSA). Lin et al. [43] combined an endocrine-based PSO algorithm with the artificial bee colony (ABC) algorithm and used support vector machines (SVMs) to classify specific medical datasets. Brahim Sahmad et al. [44] employed a binary firefly algorithm that quickly identifies high-quality solutions and significantly improves classification accuracy on medical datasets. Anter et al. [45] introduced a hybrid crow search optimization algorithm that combines chaos theory with the fuzzy c-means algorithm, and its performance on medical datasets demonstrated its effectiveness and stability. Sahlol et al. [46] combined fractional-order calculus with the marine predator algorithm (MPA) and achieved high classification accuracy on a COVID-19 dataset. Asghari Varzaneh et al. [47] applied the horse herd optimization algorithm (HOA) to predict the intubation risk of hospitalized COVID-19 patients, efficiently identifying critical predictive factors. Nadimi-Shahraki et al. [48] deployed a binary version of the quantum-based avian navigation optimizer algorithm (BQANA) with a threshold method on high-dimensional medical datasets; the method outperformed nine well-known binary metaheuristic algorithms in selecting optimal feature subsets. Elgamal et al. [49] proposed an improved reptile search algorithm (IRSA) that incorporates chaos theory and simulated annealing; it effectively improves the search capability and performed better than the original algorithm and the comparison methods on medical datasets. Wang et al. [50] introduced SVM-MPA, a combination of the MPA with an SVM, and constructed an effective subject-independent detection model for anterior cruciate ligament (ACL) deficiency, providing an accurate preoperative auxiliary test for its clinical diagnosis. Nadimi-Shahraki et al. [51] enhanced the WOA with new mechanisms and search strategies, proposing E-WOA and its binary version BE-WOA, which were verified on medical disease datasets. The test results showed that E-WOA outperformed recent optimization methods, and the algorithm was also successfully applied to COVID-19 diagnosis, providing a feasible model for diagnostic medical treatment. Braik et al. [52] proposed three variants of the capuchin search algorithm (ECSA, PCSA, and SCSA), combined their binary versions with a k-nearest neighbor (KNN) classifier, and demonstrated their performance on medical datasets. To address the rapidly increasing number of glaucoma cases, Singh et al. [53] proposed a hybrid of the emperor penguin optimization algorithm and bacterial foraging optimization to extract effective features from retinal fundus baseline images, minimizing the number of features while improving classification accuracy; the method can assist overworked ophthalmologists and help prevent vision loss. To address the slow convergence and the imbalance between exploration and exploitation of the hunger games search (HGS), Hashim et al. [54] proposed an improved version named mHGS, which solved the feature selection problem well on Parkinson's disease phonation datasets. Neggaz et al. [55] proposed an enhanced variant of the manta ray foraging optimizer (MRFO), MRFOSCA, which uses trigonometric operators inspired by the sine cosine algorithm (SCA) to reduce convergence to local minima, and applied it to feature selection on medical datasets.
Related studies have shown that various swarm intelligence algorithms have been used to select important features from medical data to assist data analysis, as summarized in Table 1. However, these algorithms have some limitations when dealing with medical datasets. First, some studies target only a specific, single type of medical data and therefore do not generalize to medical data analysis at large. Second, some studies focus on low-dimensional datasets and may not be able to handle today's high- and ultrahigh-dimensional medical data. Third, some methods suffer from unbalanced search strategies and low population diversity. Finally, some studies use medical data only as part of their evaluation rather than focusing specifically on it, and their evaluation criteria do not account for the characteristics of medical data. To address these issues, this study develops the BIMGO algorithm for feature selection on medical data, aiming to improve the robustness of the algorithm, the diversity of its solutions, and the balance of its search strategies, and applies it to medical data of multiple dimensions.

A. Mountain gazelle optimizer
The MGO algorithm is a novel swarm intelligence algorithm [39] inspired by the social behavior and herd life of mountain gazelles living around the Arabian Peninsula. Mountain gazelles form three groups: female (maternity) herds, young bachelor male herds, and territorial male herds [56]. During the optimization, every gazelle can become a member of any of the three groups. Because young male gazelles are not yet mature or strong enough to mate or to control a female herd, the MGO selects a search population comprising one-third of the population, which incurs minimal cost.
In the MGO, the adult male gazelle in a herd territory represents the global best solution. The MGO uses four mechanisms for mathematical modeling, described as follows.
1) Solitary territorial males. When male gazelles mature and become strong enough, they choose areas far from other territories to establish and defend their own territory. Young males fight territorial males when they try to take over the territory or the females. The adult territorial male is therefore influenced by the young males and by the current best individual. The territory of an adult male is modeled as

$$TSM = male_{gazelle} - \left| \left( ri_1 \times BH - ri_2 \times X(t) \right) \times F \right| \times Cof_r \qquad (1)$$

where $male_{gazelle}$ is the position vector of the best individual, $ri_1$ and $ri_2$ are random integers equal to 1 or 2, $X(t)$ is the position of the current individual, and $Cof_r$ is a coefficient vector randomly selected from Eq (4). $BH$ is the coefficient vector of the young male herd and is determined by Eq (2). $F$ denotes an iteration-dependent weight and is calculated by Eq (3):

$$BH = X_{ra} \times \lfloor r_1 \rfloor + M_{pr} \times \lceil r_2 \rceil, \quad ra \in \{\lceil N/3 \rceil, \ldots, N\} \qquad (2)$$

$$F = N_1(D) \times \exp\left(2 - Iter \times \frac{2}{MaxIter}\right) \qquad (3)$$
In Eq (2), $ra$ denotes the index range of the young individuals, and $X_{ra}$ is a random solution within this range, representing a young male. $M_{pr}$ is the average of $\lceil N/3 \rceil$ randomly selected search agents, $r_1$ and $r_2$ are random values in [0, 1], and $N$ is the population size. In Eq (3), $N_1(D)$ is a vector of random numbers drawn from the standard normal distribution over the problem dimension $D$, exp is the exponential function, $Iter$ is the current iteration, and $MaxIter$ is the maximum number of iterations.
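To enhance the search capability, the coefficient vector $Cof_i$ is defined as

$$Cof_i = \begin{cases} (a + 1) + r_3, \\ a \times N_2(D), \\ r_4(D), \\ N_3(D) \times N_4(D)^2 \times \cos\left((r_4 \times 2) \times N_3(D)\right) \end{cases} \qquad (4)$$

where $r_3$ and $r_4$ are random values in [0, 1], $N_2$, $N_3$, and $N_4$ are vectors of normally distributed random numbers over the problem dimensions, and $a$ is a control parameter determined by Eq (5):

$$a = -1 + Iter \times \left(\frac{-1}{MaxIter}\right) \qquad (5)$$

Eq (5) shows that $a$ depends on the iteration count, and its value lies in [−2, −1).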
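The quantities in Eqs (3) to (5) can be sketched in Python with the standard library as follows. This is an illustration only: drawing $N_1$ to $N_4$ from the standard normal distribution, using a fresh uniform vector for $r_4(D)$ and a separate scalar $r_4$ inside the cosine term are assumptions about details the equations leave open, and the function names are hypothetical.

```python
import math
import random

def control_factor(it: int, max_iter: int) -> float:
    """Eq (5): a = -1 + it * (-1 / max_iter); lies in [-2, -1) for it = 1..max_iter."""
    return -1.0 + it * (-1.0 / max_iter)

def weight_f(it: int, max_iter: int, dim: int, rng: random.Random) -> list:
    """Eq (3): F = N1(D) * exp(2 - it * 2 / max_iter), with N1 standard normal (assumed)."""
    scale = math.exp(2.0 - it * (2.0 / max_iter))
    return [rng.gauss(0.0, 1.0) * scale for _ in range(dim)]

def coefficient_vectors(a: float, dim: int, rng: random.Random) -> list:
    """Eq (4): the four candidate coefficient vectors from which Cof_r is drawn."""
    r3 = rng.random()
    r4 = rng.random()  # assumption: scalar r4 used inside the cosine term
    n2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n3 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n4 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [
        [(a + 1.0) + r3] * dim,                 # (a + 1) + r3, same value in every dimension
        [a * v for v in n2],                     # a * N2(D)
        [rng.random() for _ in range(dim)],      # r4(D): assumed fresh uniform vector
        [n3[j] * n4[j] ** 2 * math.cos((r4 * 2.0) * n3[j]) for j in range(dim)],
    ]

rng = random.Random(42)
a = control_factor(it=100, max_iter=500)                    # about -1.2
f = weight_f(it=100, max_iter=500, dim=5, rng=rng)
cof_r = rng.choice(coefficient_vectors(a, dim=5, rng=rng))  # Cof_r: one of the four vectors
print(f"a = {a:.3f}, len(F) = {len(f)}, len(Cof_r) = {len(cof_r)}")
```

Keeping the four candidates separate and drawing one at random each time mirrors the text's description of $Cof_r$ as a randomly selected coefficient vector.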
2) Maternity herds. The continuation of the life cycle of a mountain gazelle group depends on male gazelles reproducing with individuals in the female herd. Male gazelles play an important role when young males attempt to mate with females and when females give birth. The female herd is influenced by the best individual of the current iteration, by random search agents, and by young males; this behavior is expressed by Eq (6):

$$MH = \left(BH + Cof_{1,r}\right) + \left(ri_3 \times male_{gazelle} - ri_4 \times X_{rand}\right) \times Cof_{2,r} \qquad (6)$$

where $BH$ is calculated using Eq (2), $Cof_{1,r}$ and $Cof_{2,r}$ are coefficient vectors randomly selected from Eq (4), $ri_3$ and $ri_4$ are random integers equal to 1 or 2, and $X_{rand}$ is the position vector of a random individual.
3) Bachelor male herds. When young male gazelles grow into adults, they establish their own territories and try to take control of female gazelles, which leads to violent fights with older males. This behavior is expressed by Eq (7):

$$BMH = \left(X(t) - D\right) + \left(ri_5 \times male_{gazelle} - ri_6 \times BH\right) \times Cof_r \qquad (7)$$

where $X(t)$ is the position of the individual in the current iteration, $ri_5$ and $ri_6$ are random integers equal to 1 or 2, $Cof_r$ is a coefficient vector randomly selected from Eq (4), and $D$ is a coefficient vector influenced by the current and the best individuals, calculated by Eq (8):

$$D = \left(|X(t)| + |male_{gazelle}|\right) \times \left(2 \times r_6 - 1\right) \qquad (8)$$

where $r_6$ is a random number in [0, 1].

4) Migration to search for food. Gazelles must constantly find food to survive, so they keep searching and migrating, relying on their impressive running and jumping abilities. This process is expressed by Eq (9):

$$MSF = (ub - lb) \times r_7 + lb \qquad (9)$$

where $ub$ and $lb$ are the upper and lower bounds of the problem, respectively, and $r_7$ is a random value in [0, 1].
All four mechanisms are applied to every individual to produce a new generation, whose members are added to the population, so the candidate pool temporarily grows. After each generation, the whole population is sorted in ascending order of fitness; the best individuals are retained, and the poorest are removed.
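One MGO generation can be sketched in Python with the standard library as follows. The sketch transcribes Eqs (1) to (9) as written in this section; clipping offspring to [lb, ub], drawing $r_7$ separately for each dimension, the toy sphere objective, and all function names are assumptions for illustration, and this is not the authors' MATLAB implementation.

```python
import math
import random

def sphere(x):
    """Toy objective used only for the demonstration."""
    return sum(v * v for v in x)

def mgo_generation(pop, fit, func, it, max_iter, lb, ub, rng):
    """Each gazelle produces TSM, MH, BMH and MSF offspring (Eqs (1)-(9)); parents and
    offspring are merged, sorted by ascending fitness, and truncated to N individuals."""
    n, dim = len(pop), len(pop[0])
    best = pop[min(range(n), key=fit.__getitem__)]           # male_gazelle
    a = -1.0 + it * (-1.0 / max_iter)                         # Eq (5)
    k = math.ceil(n / 3)

    def clip(x):                                              # assumption: clip to [lb, ub]
        return [min(max(v, lb), ub) for v in x]

    offspring = []
    for x in pop:
        # Eq (4): four candidate coefficient vectors; each Cof_r below is drawn from them.
        r3, r4 = rng.random(), rng.random()
        n2, n3, n4 = ([rng.gauss(0, 1) for _ in range(dim)] for _ in range(3))
        cofs = [[a + 1.0 + r3] * dim,
                [a * v for v in n2],
                [rng.random() for _ in range(dim)],
                [n3[j] * n4[j] ** 2 * math.cos(2.0 * r4 * n3[j]) for j in range(dim)]]
        # Eq (2): X_ra from indices ceil(N/3)..N (1-based), M_pr = mean of ceil(N/3) agents.
        x_ra = pop[rng.randint(k - 1, n - 1)]
        sample = rng.sample(pop, k)
        m_pr = [sum(s[j] for s in sample) / k for j in range(dim)]
        fl, ce = math.floor(rng.random()), math.ceil(rng.random())
        bh = [x_ra[j] * fl + m_pr[j] * ce for j in range(dim)]
        # Eq (3): iteration-dependent weight F.
        F = [rng.gauss(0, 1) * math.exp(2.0 - it * 2.0 / max_iter) for _ in range(dim)]
        ri = [rng.randint(1, 2) for _ in range(6)]            # ri_1..ri_6 in {1, 2}
        c_a, c_b, c_c, c_d = (rng.choice(cofs) for _ in range(4))
        tsm = [best[j] - abs((ri[0] * bh[j] - ri[1] * x[j]) * F[j]) * c_a[j]
               for j in range(dim)]                                           # Eq (1)
        x_rand = rng.choice(pop)
        mh = [(bh[j] + c_b[j]) + (ri[2] * best[j] - ri[3] * x_rand[j]) * c_c[j]
              for j in range(dim)]                                            # Eq (6)
        r6 = rng.random()
        d = [(abs(x[j]) + abs(best[j])) * (2.0 * r6 - 1.0) for j in range(dim)]  # Eq (8)
        bmh = [(x[j] - d[j]) + (ri[4] * best[j] - ri[5] * bh[j]) * c_d[j]
               for j in range(dim)]                                           # Eq (7)
        msf = [(ub - lb) * rng.random() + lb for _ in range(dim)]             # Eq (9)
        offspring += [clip(tsm), clip(mh), clip(bmh), msf]
    merged = pop + offspring
    merged_fit = fit + [func(o) for o in offspring]
    order = sorted(range(len(merged)), key=merged_fit.__getitem__)[:n]   # keep the N best
    return [merged[i] for i in order], [merged_fit[i] for i in order]

rng = random.Random(1)
pop = [[rng.uniform(-10, 10) for _ in range(5)] for _ in range(12)]
fit = [sphere(p) for p in pop]
start = min(fit)
for it in range(1, 51):
    pop, fit = mgo_generation(pop, fit, sphere, it, 50, -10.0, 10.0, rng)
print(f"best fitness: {start:.3g} -> {fit[0]:.3g}")
```

Because parents stay in the merged pool before truncation, the best fitness can never get worse from one generation to the next, which is the elitist behavior the paragraph above describes.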

B. Improved mountain gazelle optimizer
To achieve high accuracy and fast convergence, the MGO adopts four different mechanisms. However, it still has shortcomings, such as an imbalance between exploration and exploitation and a tendency to fall into local optima. To improve the capability of the MGO and achieve better feature selection in medical data analysis, this paper presents an improved mountain gazelle optimizer (IMGO) with four innovations. First, ICMIC mapping is used for initialization, which ensures population diversity and improves the early exploration capability of the algorithm. Second, the control factor a in the coefficient vector is replaced with a nonlinear factor that balances the global and local search capabilities and improves search efficiency. Third, a spiral perturbation mechanism is used to improve the local search capability. Finally, a search in the neighborhood of the optimal individual further strengthens exploitation.
1) Initialization of ICMIC mapping.In the MGO algorithm, the population is initialized randomly.This initialization strategy limits the algorithm's performance because it does not ensure the diversity of solutions and may easily cause the algorithm to fall into local optima.
We use chaotic mapping as the initialization method to address this problem. The initial population obtained by this method covers the entire solution space, which enhances the global search capability. In addition, the population created through chaotic mapping is evenly distributed, which helps reduce the likelihood of becoming trapped in local optima and contributes to faster convergence [57,58]. The logistic and tent maps are the most common chaotic maps; however, they have a limited number of folds in their iterative regions. In contrast, the ICMIC [59] is an iterative chaotic map with infinitely many folds. Moreover, the ICMIC has a high Lyapunov exponent and therefore stronger chaotic properties than the logistic and tent maps. The ICMIC is also sensitive to initial values and uniformly distributed. Therefore, the ICMIC is used in this paper to initialize the population. Its ergodicity overcomes a shortcoming of traditional optimization algorithms by providing a more diverse initial population, which helps prevent premature convergence and improves the accuracy and convergence of the global optimization.
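Eq (10) gives the mathematical expression of the ICMIC:

$$x_{i+1} = \sin\left(\frac{\alpha}{x_i}\right), \quad x_i \in [-1, 0) \cup (0, 1] \qquad (10)$$

where $x_i$ is the i-th gazelle individual and $\alpha \in (0, 1)$ is the control parameter. A good chaotic sequence can be obtained only when $\alpha > 0.6$; hence, we set $\alpha = 0.7$, and Eq (10) becomes Eq (11):

$$x_{i+1} = \sin\left(\frac{0.7}{x_i}\right) \qquad (11)$$

Algorithm 1 shows the details of population initialization; in its final step, each chaotic value is mapped back onto the search range [lb, ub] to update the individual's position.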
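The initialization can be sketched in Python as follows. Only the recurrence comes from Eqs (10) and (11); assigning successive chaotic values to successive individuals, drawing the first individual uniformly from (0, 1], and the linear mapping from [−1, 1] onto [lb, ub] are assumptions, and `icmic_population` is a hypothetical helper, not the authors' Algorithm 1.

```python
import math
import random

def icmic_population(n, dim, lb, ub, alpha=0.7, rng=None):
    """Sketch of ICMIC-based initialization (Eqs (10)-(11)).

    Assumptions not taken from the paper: the first individual is drawn uniformly
    from (0, 1], each following individual applies x <- sin(alpha / x) element-wise
    to the previous one, and chaotic values in [-1, 1] are mapped linearly onto [lb, ub].
    """
    rng = rng or random.Random()
    x = [rng.uniform(1e-6, 1.0) for _ in range(dim)]
    chaotic = []
    for _ in range(n):
        chaotic.append(x)
        # Eq (10); with alpha = 0.7 this is Eq (11). An exact zero is replaced by a
        # tiny value so that alpha / x stays defined at the next step.
        x = [math.sin(alpha / v) or 1e-6 for v in x]
    return [[lb + (v + 1.0) / 2.0 * (ub - lb) for v in row] for row in chaotic]

pop = icmic_population(n=30, dim=5, lb=-100.0, ub=100.0, rng=random.Random(0))
# Spread of the first coordinate across the population.
first = [ind[0] for ind in pop]
print(f"{len(pop)} individuals, first coordinate in [{min(first):.2f}, {max(first):.2f}]")
```

Depending on α and the starting values, the iterates of this map can settle near a fixed point instead of wandering, so for any chosen α it is worth checking the spread of the generated population, as the last two lines do.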
2) Nonlinear control factor. $Cof_r$ is a coefficient vector randomly selected in each iteration. It participates in three of the four mechanisms, namely the solitary territorial males (TSM), maternity herds (MH), and bachelor male herds (BMH) mechanisms, and its value is largely determined by the control factor a in Eq (5).
According to Eq (5), Fig 1(A) shows that the control factor varies linearly within [−2, −1), that is, at a constant rate. If the algorithm becomes stuck at a non-ideal point in the initial phase, this constant rate of change makes it prone to premature convergence to a local optimum. We therefore introduce a nonlinear control factor that strengthens the global search in the early stages, so that the solution domain is explored more thoroughly, and shifts the emphasis to local search in the later stages, so that better solutions are sought within the promising region already found. Accordingly, the control factor a is changed from Eq (5) to Eq (12).
where t represents the current iteration, and MaxIter is the maximum number of iterations.
As shown in Fig 1(B), the nonlinear control factor has a steep initial slope and therefore changes quickly in the early iterations, which, compared with the linear factor, widens the search range and strengthens the global search. Near MaxIter, the linear factor still changes at the same rate, so the search range is not effectively narrowed and convergence suffers. The nonlinear factor instead changes slowly at this stage, which narrows the search range, strengthens the local search, accelerates convergence, and improves the quality of the solutions found.
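Eq (12) is not reproduced in this text, so the sketch below does not implement it. It only contrasts the linear factor of Eq (5) with a hypothetical sine-shaped schedule, chosen because it has the same end points and the qualitative shape described for Fig 1(B): a steep start and a flat end. The name `a_nonlinear_example` is illustrative.

```python
import math

def a_linear(t, max_iter):
    """Eq (5): changes at a constant rate from -1 (t = 0) to -2 (t = max_iter)."""
    return -1.0 + t * (-1.0 / max_iter)

def a_nonlinear_example(t, max_iter):
    """HYPOTHETICAL nonlinear schedule with the same end points as Eq (5).
    It is NOT the paper's Eq (12); it only illustrates a steep start and a flat end."""
    return -1.0 - math.sin(math.pi * t / (2.0 * max_iter))

max_iter = 500
for t in (0, 50, 250, 450, 500):
    print(f"t={t:>3}  linear={a_linear(t, max_iter):+.3f}  "
          f"nonlinear={a_nonlinear_example(t, max_iter):+.3f}")
```

Over the first 50 iterations the sine schedule moves about 0.156 versus 0.1 for the linear one, while over the last 50 it moves only about 0.012, which is the contrast the paragraph above draws.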
3) Spiral perturbation mechanism. New individuals are continuously added through the four mechanisms during the iterations. However, the original algorithm does not update the positions of existing gazelles, which limits its local search capability and prevents fast convergence. To increase the local search capacity, we therefore introduce a spiral perturbation mechanism for the gazelle individuals. The spiral search strategy was proposed in the WOA to model the spiral movement of humpback whales as they close in on prey; individual whales use it to update their positions, which increases individual diversity while maintaining convergence speed. Inspired by this strategy, in this paper each current individual is perturbed after the gazelle population is updated, the fitness of the perturbed individual is compared with its fitness before the perturbation, and the better individual is retained. The perturbation is guided by the global best individual of the current iteration and follows the spiral search in Eq (13), where $male_{gazelle}$ is the best individual, c is the spiral shape constant, and l is the path coefficient, a random number in [−1, 1]. Introducing the spiral perturbation and keeping the better of each individual before and after perturbation effectively strengthens the local search and speeds up convergence; it also increases individual diversity and improves the overall search efficiency.
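Because Eq (13) itself is not shown in the text above, the sketch below assumes the WOA spiral-update form, with the distance $|male_{gazelle} - X|$, shape constant c = 1, and one scalar l per individual, followed by the greedy selection the paragraph describes. Clipping to bounds is also an assumption, and `spiral_perturb` is a hypothetical name.

```python
import math
import random

def sphere(x):
    """Toy objective for the demonstration."""
    return sum(v * v for v in x)

def spiral_perturb(x, fx, best, func, lb, ub, c=1.0, rng=random):
    """WOA-style spiral move toward `best`, followed by greedy selection.

    Assumed form of Eq (13) (borrowed from the WOA, not copied from the paper):
        x_new = |best - x| * exp(c * l) * cos(2 * pi * l) + best,  l ~ U[-1, 1]
    """
    l = rng.uniform(-1.0, 1.0)
    factor = math.exp(c * l) * math.cos(2.0 * math.pi * l)
    x_new = [abs(b - v) * factor + b for v, b in zip(x, best)]
    x_new = [min(max(v, lb), ub) for v in x_new]   # assumption: keep the candidate in bounds
    f_new = func(x_new)
    # Keep the better of the original and the perturbed individual.
    return (x_new, f_new) if f_new < fx else (x, fx)

rng = random.Random(3)
best = [0.5, -0.2, 0.1]
x = [4.0, 3.0, -2.0]
y, fy = spiral_perturb(x, sphere(x), best, sphere, -10.0, 10.0, rng=rng)
print(sphere(x), "->", fy)
```

The greedy comparison is what guarantees that the perturbation can only keep or improve each individual's fitness.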
4) Optimal individual neighborhood search. The MGO optimizes with four mechanisms, namely TSM, MH, BMH, and migration to search for food (MSF), which mainly strengthen its global search. Its limited local search ability, however, makes convergence difficult. To better balance exploration and exploitation throughout the optimization, and because the global optimum is likely to lie near the best individuals, we introduce an optimal individual neighborhood search strategy.
In each iteration, this method records the optimal gazelle $male^{t}_{gazelle}$ and the suboptimal gazelle $male^{t}_{gazelle-1}$ and defines the region between them as the neighborhood of the optimal individual. A local search is then conducted within this neighborhood, and a random individual $BN^t$ within the neighborhood is generated by Eq (14), where random denotes a randomly selected gazelle within the neighborhood, $ub_{bn}$ and $lb_{bn}$ are the upper and lower bounds of the neighborhood, respectively, and t is the current iteration number. Finally, the fitness of the individual obtained from Eq (14) is compared with the current best value, and $male_{gazelle}$ is updated if the new individual is better. By strengthening the local search around the optimal and suboptimal individuals, this strategy effectively improves the exploitation and convergence abilities of the algorithm.
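Since Eq (14) is not reproduced above, the sketch below assumes one plausible reading: $lb_{bn}$ and $ub_{bn}$ are the element-wise minimum and maximum of the optimal and suboptimal individuals, and random is a uniform number in [0, 1] drawn per dimension. `neighborhood_search` is a hypothetical helper, not the authors' code.

```python
import random

def sphere(x):
    """Toy objective for the demonstration."""
    return sum(v * v for v in x)

def neighborhood_search(best, f_best, second, func, rng=random):
    """Sample one candidate BN^t in the box spanned by the optimal and the
    suboptimal gazelle and keep it only if it improves on the best.

    Assumed reading of Eq (14): lb_bn / ub_bn are the element-wise min / max of
    the two individuals, and `random` is a uniform number in [0, 1] per dimension.
    """
    lb_bn = [min(a, b) for a, b in zip(best, second)]
    ub_bn = [max(a, b) for a, b in zip(best, second)]
    bn = [lo + rng.random() * (hi - lo) for lo, hi in zip(lb_bn, ub_bn)]
    f_bn = func(bn)
    # Replace the best individual only if the candidate is strictly better.
    return (bn, f_bn) if f_bn < f_best else (best, f_best)

rng = random.Random(7)
best, second = [1.0, -1.0], [-0.5, 0.5]
f_best = sphere(best)
for _ in range(20):   # repeated calls only to show the effect; `second` is kept fixed here
    best, f_best = neighborhood_search(best, f_best, second, sphere, rng)
print(best, f_best)
```

In the full algorithm the optimal and suboptimal individuals change from iteration to iteration, so the box being searched moves along with the population.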
5) Algorithm framework. Fig 2 shows the process of the proposed IMGO. IMGO first uses the ICMIC to initialize the gazelle population. In each iteration, the nonlinear control factor is computed, and the population is then updated with the four mechanisms in parallel. Next, spiral perturbation adjusts the positions of the gazelle individuals. Finally, the algorithm searches the neighborhood of the optimal individual to update it. By improving local exploitation and balancing global and local search, these steps improve the convergence speed and overall performance of the algorithm. Algorithm 2 gives the pseudocode of IMGO.

C. Binary improved mountain gazelle optimizer
The IMGO algorithm is designed for continuous search spaces and cannot be applied directly to discrete ones. We therefore developed a binary version of the IMGO, named BIMGO. To handle discrete optimization, a sigmoid function is used to map the continuous space into the discrete space, as shown in Eq (15):

$$S(X_i) = \frac{1}{1 + e^{-X_i}} \qquad (15)$$

where $S(X_i)$ represents the probability of changing the binary position. A threshold must then be set; to split the [0, 1] output range of the sigmoid evenly between the two binary values, we set the threshold to 0.5. The position $B_i$ is updated by Eq (16):

$$B_i = \begin{cases} 1, & S(X_i) \geq 0.5 \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$

Algorithm 3 presents the pseudocode of the binary conversion process.
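The binary conversion can be sketched in Python as follows, assuming Eq (15) is the standard logistic sigmoid and that a value exactly at the 0.5 threshold maps to 1. The sigmoid is written in a numerically stable form so that very large or very small positions do not overflow; the function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Eq (15) (assumed standard logistic form), stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binarize(position, threshold=0.5):
    """Eq (16): feature j is selected (1) when S(X_j) >= threshold, otherwise dropped (0)."""
    return [1 if sigmoid(v) >= threshold else 0 for v in position]

x = [-2.3, 0.7, 4.1, -0.1, 1500.0]
mask = binarize(x)
print(mask)        # → [0, 1, 1, 0, 1]
selected = [j for j, bit in enumerate(mask) if bit]
print(selected)    # → [1, 2, 4]
```

With a fixed threshold of 0.5, a dimension is selected exactly when its continuous position is non-negative, so the binary mask can be read directly as the indices of the selected features.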

A. Experiment on benchmark functions
To validate the effectiveness and performance of the proposed IMGO algorithm, we compared it with the original MGO algorithm and eight well-known optimization algorithms on 23 benchmark functions: the WOA, PSO, GWO, SSA, MPA, NOA, KOA, and BA.
1) Parameter settings and benchmark functions.Table 2 shows the settings of the relevant parameters used in the experiment.The parameters of all comparative algorithms are set on the basis of their original papers.
All experiments were performed under the same conditions.The computer used for these experiments was equipped with an Intel Core i7, 2.8 GHz CPU, and 16 GB of RAM.All experiments were implemented in MATLAB 2023a and run on the same computer with the Windows 10 operating system.
The performance of IMGO and the comparison algorithms is evaluated on the 23 classic benchmark functions listed in Table 3. The functions fall into three categories: unimodal benchmark functions (f1−f7), multimodal benchmark functions (f8−f13), and fixed-dimension multimodal benchmark functions (f14−f23).
When evaluating the algorithms on the benchmark functions, the population size was set to 30, the number of independent runs to 20, and the maximum number of iterations to 500.
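For readers reproducing the setup, three representative functions can be written in Python as below. This assumes the numbering of the classic 23-function suite, in which f1 is the sphere function, f9 the Rastrigin function on [−5.12, 5.12], and f10 the Ackley function; Table 3 remains the authoritative definition, and the function names are illustrative.

```python
import math

def f1_sphere(x):
    """Unimodal: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)

def f9_rastrigin(x):
    """Multimodal: many regularly spaced local minima, global minimum 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)

def f10_ackley(x):
    """Multimodal: nearly flat outer region with a deep central basin, minimum 0 at the origin."""
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20.0 + math.e

origin = [0.0] * 30   # Dim = 30, the usual dimension for f1-f13
point = [1.5] * 30
for f in (f1_sphere, f9_rastrigin, f10_ackley):
    print(f"{f.__name__}: f(origin) = {f(origin):.2e}, f(1.5, ..., 1.5) = {f(point):.3f}")
```

The unimodal sphere rewards pure exploitation, whereas Rastrigin and Ackley punish premature convergence, which is why the suite separates f1−f7 from f8−f13.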
2) Experimental results and discussion. The best values, mean values, and standard deviations are used for the comparative analysis. The best algorithm for each function is determined by the mean value and is highlighted in bold. When algorithms have identical mean values on a function, the standard deviation is used as the tiebreaker, with a smaller value being better. Moreover, a Wilcoxon rank-sum test at the 5% significance level is used to determine whether the performance of IMGO differs significantly from that of each competing algorithm on each function.
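The significance test can be sketched with the Python standard library as a two-sided Wilcoxon rank-sum test using the normal approximation, average ranks for ties, and no tie correction in the variance. This is an illustration, not the authors' MATLAB code; `scipy.stats.ranksums` computes the same normal-approximation statistic. The run values below are hypothetical placeholders, not results from this paper.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0   # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (r1 - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p = 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical final fitness values from 20 independent runs of two algorithms.
runs_a = [0.010 + 0.001 * k for k in range(20)]
runs_b = [0.050 + 0.001 * k for k in range(20)]
z, p = rank_sum_test(runs_a, runs_b)
print(f"z = {z:.2f}, p = {p:.2e}, significant at 5%: {p < 0.05}")
```

With 20 runs per algorithm, as in the setup above, a p-value below 0.05 is read as a significant difference between the two algorithms on that function.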
According to the 'No Free Lunch (NFL)' theorem [51,60], no algorithm can achieve the best results on all test functions. Because the IMGO algorithm is an improved version of the MGO algorithm, it is important to compare it not only with the other eight algorithms but also with the original MGO. Therefore, where the IMGO performs better than the MGO but does not rank best overall, the results are underlined.
Table 4 presents the results of the IMGO and the comparative algorithms on the unimodal benchmark functions.
On f1, f3, f4, f5, and f7, the IMGO achieved the best mean values and standard deviations. On f2, the IMGO ranked second, behind the WOA, although it still outperformed the original MGO. On f6, the IMGO ranked fourth among all compared algorithms, behind the MGO. The average rank of the IMGO on f1−f7 is 1.6, the best among all algorithms.
Fig 3 shows the convergence curves of each method on the different functions. The IMGO algorithm converged best on f1, f3, f4, f5, and f7, and its advantage was most pronounced on f1, f4, and f5, where it rapidly converged to the optimal solution. The IMGO showed its weakest performance on f6, where it ranked fourth in terms of overall convergence. Although it ranked second on f2, its overall convergence efficiency surpassed that of the original MGO. Overall, the IMGO exhibited strong search capability and converged quickly toward the optima throughout the iterations.
Based on the above analysis, the IMGO performed best on the unimodal functions f1−f7, reflecting its ability to handle simple optimization problems.
Table 5 presents the results on the multimodal benchmark functions.
The IMGO algorithm attained the best mean among all algorithms on f8, f9, f10, f11, and f13. On f9 and f11, the MGO and IMGO both achieved the best results in terms of the best value, mean, and standard deviation. On f12, the IMGO performed only slightly worse than the MGO. The average rank of the IMGO on f8−f13 is 1.2, the best among all algorithms.
Fig 4 shows the convergence curves, in which the IMGO exhibits the best convergence performance on f8, f9, f10, f11, and f13. Notably, it converges to the optimum in fewer than 100 iterations on both f9 and f11, and it also converges strongly on f13, reaching the optimal solution after 300 iterations. Although the MGO outperformed the IMGO on f12, the IMGO significantly outperformed the remaining algorithms on that function. Taken together, Table 5 and Fig 4 show that the IMGO performs best on f8−f13.
Table 6 compares the IMGO algorithm with the other methods on the fixed-dimensional multimodal benchmark functions. The IMGO algorithm achieved the best mean and standard deviation on f18, f19, f21, f22, and f23. It also achieved the best mean on f14, f16, and f17, but its standard deviation was larger than that of the MPA on f14 and larger than those of the MPA and MGO on f16 and f17.
On f15, the IMGO performs only slightly worse than the MPA but better than the MGO. On f20, its performance is average, ranking only fourth, but it is still better than the MGO. Because Table 6 reports values in scientific notation with limited precision, some values that appear identical actually differ, which is why they receive different ranks. The average rank of the IMGO on f14−f23 is 1.4, the best among all algorithms.
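The effect of truncated scientific notation on apparent ties can be shown with a tiny Python example. The two means are hypothetical, and four decimal places is an assumed display precision, since the exact number of digits shown in Table 6 is not stated here.

```python
def displayed(value: float, digits: int = 4) -> str:
    """Format a value in scientific notation with a fixed number of decimals, as in a results table."""
    return f"{value:.{digits}E}"


if __name__ == "__main__":
    # Hypothetical means of two algorithms on the same function (not values from Table 6).
    mean_a, mean_b = 0.3978873, 0.3978874
    print(displayed(mean_a), displayed(mean_b))  # 3.9789E-01 3.9789E-01
    print(mean_a < mean_b)                       # True, so A still ranks above B
```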
The convergence curves for f14−f23 are shown in Fig 5. The IMGO algorithm generally converged faster and found better solutions on most of these functions, reaching the optimal solutions more quickly than the competing algorithms on f14, f16, f17, f18, f19, f21, f22, and f23. Overall, the IMGO demonstrated the best performance on f14−f23. The results on the multimodal benchmark functions (f8−f23) show that the IMGO balances exploration and exploitation and avoids becoming trapped in local optima. We attribute its better results on both unimodal and multimodal problems to the proposed improvements, which strengthen this balance between exploration and exploitation.
Table 7 presents the statistics of the 5% Wilcoxon rank-sum test between the IMGO and each compared algorithm on every benchmark function, using the symbols (+, =, and −) to denote whether the IMGO's performance on a function is 'significantly better than, comparable to, or worse than' that of the compared algorithm.
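A minimal sketch of how a +, =, or − label can be derived from two sets of run results, assuming minimization and the large-sample normal approximation of the rank-sum statistic without tie correction (the same approximation used by SciPy's scipy.stats.ranksums). The experiments in the paper were run in MATLAB, whose exact procedure may differ; this version uses only the Python standard library, and the run data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of sample x
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def significance_label(ours, theirs, alpha: float = 0.05) -> str:
    """'+' ours significantly better (lower), '-' significantly worse, '=' otherwise."""
    z, p = rank_sum_test(ours, theirs)
    if p >= alpha:
        return "="
    return "+" if z < 0 else "-"


if __name__ == "__main__":
    imgo = [0.1 * i for i in range(20)]  # hypothetical final errors, 20 runs
    rival = [10.0 + 0.1 * i for i in range(20)]
    print(significance_label(imgo, rival), significance_label(rival, imgo),
          significance_label(imgo, imgo))  # + - =
```

Omitting the tie correction slightly overestimates the variance when many runs reach the same value (for example, all runs hitting the exact optimum), so the sketch errs toward reporting '='.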
Table 7 indicates that the IMGO algorithm outperformed the other algorithms in most cases: it outperformed the BA on 22 of 23 functions, the GWO on 21, the PSO on 16, the SSA on 20, the WOA on 19, the KOA on all 23, the NOA on 21, the MPA on 15, and the MGO on 10. However, it was less effective than the BA on 1 function, the PSO on 1, the SSA on 2, the NOA on 1, and the MPA on 3. Compared with the MGO, the IMGO underperformed on 7 of the 23 functions. The statistical results indicate that the median performance of the IMGO was the best among all algorithms. In conclusion, the comparison on the 23 benchmark functions shows that the IMGO outperformed both the original MGO and the other comparative methods, with an average rank of 1.4, the best among all algorithms. The IMGO outperformed the original MGO on 16 of the 23 functions and was inferior to it on only 2. These findings confirm that our improvements are valid and effective, substantially boosting the optimization and convergence capabilities of the algorithm, and that the IMGO has excellent optimization ability on continuous problems.
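As a consistency check on the average ranks reported above, the per-function ranks can be read off the text: first on every function where the IMGO has the best mean (including the ties with the MGO on f9 and f11), second on f2, f12, and f15, and fourth on f6 and f20. Treating f12 and f15 as second place is an assumption implied by the descriptions of the MGO and MPA results. Under these readings the reported values are reproduced:

$$\bar r_{f_1\text{-}f_7} = \frac{5 \cdot 1 + 2 + 4}{7} = \frac{11}{7} \approx 1.57 \approx 1.6$$

$$\bar r_{f_8\text{-}f_{13}} = \frac{5 \cdot 1 + 2}{6} = \frac{7}{6} \approx 1.17 \approx 1.2$$

$$\bar r_{f_{14}\text{-}f_{23}} = \frac{8 \cdot 1 + 2 + 4}{10} = \frac{14}{10} = 1.4$$

$$\bar r_{f_1\text{-}f_{23}} = \frac{11 + 7 + 14}{23} = \frac{32}{23} \approx 1.39 \approx 1.4$$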
The parameters of each algorithm are set according to its original paper, as listed in Table 2. To ensure a fair evaluation, all algorithms use the same common parameters on the medical datasets: a population size of 30, a maximum of 100 iterations, and 20 independent runs per algorithm. A support vector machine (SVM) classifier [61] is used to classify the feature subsets generated by each algorithm in order to evaluate its performance.
2) Evaluation metrics. The evaluation criteria are the fitness, the sensitivity, and the number of selected features. Fitness is measured by the classification error rate on the medical datasets, calculated using Eq (17):

$$Fitness = \frac{FP + FN}{TP + TN + FP + FN} \qquad (17)$$

Sensitivity is the proportion of actual positive samples that are correctly classified as positive. For medical datasets, it reflects the algorithm's ability to correctly detect disease features, which makes it an important indicator; it is calculated using Eq (18):

$$Sensitivity = \frac{TP}{TP + FN} \qquad (18)$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
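A minimal Python sketch of the two metrics computed from predicted labels, assuming a binary problem in which label 1 is the positive (disease) class. In a wrapper approach the predictions would come from the SVM evaluated on held-out data; the validation scheme is not described in this section, and the labels below are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for a binary problem with the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def fitness_error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Classification error rate: share of misclassified samples (lower is better)."""
    return (fp + fn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that are predicted positive."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when there are no positive samples")
    return tp / (tp + fn)


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical labels
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # hypothetical classifier output
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(tp, tn, fp, fn)                    # 3 5 1 1
    print(fitness_error_rate(tp, tn, fp, fn))  # 0.2
    print(sensitivity(tp, fn))               # 0.75
```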
3) Experimental results. The performance of the BIMGO was assessed on the target medical datasets and compared with that of the eight well-known methods. Tables 9-11 show the results of the experiment and the ranks of all algorithms based on the mean values, with the best values highlighted in bold.
Table 9 compares the BIMGO algorithm with the comparison algorithms in terms of fitness values. The BIMGO obtained the best average fitness values on 10 of the 16 datasets, namely Bone Marrow Transplant, Cervical Cancer, Glioma, ILPD, SPECT, Arcene Cancer, DLBCL, Parkinson Ⅱ, Prostate Tumors, and Leukemia. Since the fitness is the classification error rate, the BIMGO therefore achieved the best average classification accuracy on these 10 datasets. The proposed algorithm slightly underperformed the BMPA on the Breast Cancer dataset. The BIMGO achieved only moderate performance on the Diabetes dataset, ranking fifth. On the Heart Disease dataset, the BIMGO slightly underperformed the BBA but performed significantly better than the remaining algorithms. On the Parkinson Ⅰ dataset, the BIMGO did not perform as well as the BPSO and BGWO. On the Thoracic Surgery and Colon datasets, the BIMGO ranked second, behind the BSSA. Overall, the BIMGO achieved an average rank of 1.6 across the 16 medical datasets, significantly better than those of the other optimization algorithms.
Fig 6 plots the convergence curve of each algorithm based on the fitness value. The BIMGO exhibited significantly better convergence and fitness values than the other algorithms on the Bone Marrow Transplant, Cervical Cancer, Glioma, Arcene Cancer, and Parkinson Ⅱ datasets, and strong convergence with slightly better fitness values on the ILPD and SPECT datasets. Multiple algorithms achieved the best fitness values on the DLBCL and Prostate Tumors datasets, but the BIMGO had the strongest convergence ability. Although the BIMGO also achieved the best fitness value on the Leukemia dataset, its convergence ability there is average. The BIMGO demonstrated excellent convergence ability and achieved the second-best fitness value on the Breast Cancer, Heart Disease, Thoracic Surgery, and Colon datasets. Although the BIMGO only achieved the third-best fitness value on the Parkinson Ⅰ dataset, the gap to the BPSO and BGWO is very small, and its convergence ability is excellent. On the Diabetes dataset, the proposed algorithm exhibited average convergence ability and fitness value. Overall, Fig 6 shows that, compared with the competing algorithms, the BIMGO achieves the best overall performance on 10 datasets.
Table 10 compares the sensitivity of the algorithms, expressed as a percentage. The BIMGO achieved the highest sensitivity on 10 of the 16 datasets. Its performance on the Breast Cancer dataset was average, ranking sixth. On the Diabetes dataset, the BIMGO slightly underperformed the BWOA. It was slightly worse than the BBA and BNOA on the Heart Disease dataset and than the BPSO on the Parkinson Ⅰ dataset. All algorithms correctly identified all positive samples on the Thoracic Surgery dataset. On the Parkinson Ⅱ dataset, the BIMGO ranked fifth, with an average sensitivity of 94.07%. On the Leukemia dataset, it ranked eighth, with a large gap to algorithms such as the BKOA. Nevertheless, the BIMGO maintained the best average sensitivity ranking across all datasets, highlighting its ability to accurately detect disease features. We also compared the number of features selected by the different algorithms; Table 11 gives the detailed results, and Fig 7 provides an intuitive comparison.
As shown in Table 11, the BIMGO ranked second on average across the sixteen medical datasets in terms of the number of selected features. It selected relatively few features while maintaining the best classification accuracy, demonstrating its effectiveness in FS for medical datasets.
Fig 7 shows the maximum, minimum, and average numbers of features selected by the 9 algorithms on the 16 medical datasets; in this paper, the average number of selected features is the main evaluation criterion. As Fig 7 shows, the BIMGO selected the lowest average number of features on the Bone Marrow Transplant, Glioma, ILPD, Thoracic Surgery, Arcene Cancer, Colon, and Parkinson Ⅱ datasets. In particular, its feature reduction rate was much higher than those of the other algorithms on the Glioma, Colon, and Parkinson Ⅱ datasets. On the Parkinson Ⅰ, SPECT, DLBCL, and Leukemia datasets, the BIMGO did not select the fewest features on average, but it was close to the algorithms that selected fewer. On the Breast Cancer, Diabetes, Heart Disease, and Prostate Tumors datasets, there was a noticeable gap in the average number of selected features between the BIMGO and the better algorithms. On the Cervical Cancer dataset, the average number of features selected by the BIMGO differed significantly from that of algorithms such as the BWOA.
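The feature approximation rate mentioned above is read here as a feature reduction rate, consistent with the later statement that the feature size is reduced by 66.39% on average. The exact formula is not given in this section, so the definition below, (1 − |S|/D) × 100% per dataset averaged with equal weight over datasets, is an assumption, and the subset sizes are hypothetical.

```python
from typing import Sequence


def reduction_rate(n_selected: int, n_total: int) -> float:
    """Percentage of the original features removed by the selected subset (assumed definition)."""
    if not 0 <= n_selected <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    return 100.0 * (n_total - n_selected) / n_total


def mean_reduction_rate(selected: Sequence[int], totals: Sequence[int]) -> float:
    """Average the per-dataset reduction rates, weighting each dataset equally."""
    rates = [reduction_rate(s, t) for s, t in zip(selected, totals)]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical average subset sizes and dimensions for three datasets.
    print(reduction_rate(30, 100))                             # 70.0
    print(mean_reduction_rate([30, 5, 200], [100, 10, 1000]))  # 66.66666666666667
```

Averaging per-dataset rates keeps a single very high-dimensional dataset from dominating the summary, which would happen if total selected features were divided by total original features.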
A 5% Wilcoxon rank-sum test was conducted to determine whether the algorithms' performance on the medical datasets differs significantly. Table 12 presents the results, using the symbols (+, =, and −) to denote whether the BIMGO's performance on a dataset is 'significantly better than, comparable to, or worse than' that of each compared algorithm. In summary, the evaluation on the medical datasets shows that the BIMGO outperforms the other optimization algorithms in terms of fitness value and sensitivity, and the rank-sum test confirms that it significantly outperforms the competing algorithms. In addition, the BIMGO tends to select fewer features while maintaining the best performance on most medical datasets. These results indicate the effectiveness of the BIMGO in addressing feature selection for medical data.
4) Parameter sensitivity analysis. In optimization problems, different parameter values may lead to different results. The BIMGO algorithm proposed in this paper for feature selection on medical data has no control parameters of its own, so this section discusses the effect of the population size and the number of iterations on the BIMGO. Five datasets of different dimensions were chosen for the experiments: Diabetes, Glioma, Bone Marrow Transplant, Parkinson Ⅱ, and Arcene Cancer. The parameter values were set to population size N = {20, 30, 50} and maximum number of iterations MaxIter = {50, 100, 200}, and each experiment was run independently 20 times. The parameter sensitivity was analyzed based on the average fitness values and convergence curves.
Table 13 shows the average fitness values obtained by the BIMGO using different parameter values on 5 datasets.Fig 8 shows the convergence curves of the BIMGO for different population sizes and numbers of iterations.
The experimental results and convergence curves indicate that increasing the population size does not necessarily improve the performance of the algorithm on feature selection problems, although an appropriate choice of population size does affect performance. As the number of iterations increases, the BIMGO continues to converge toward the optimal solution. Overall, the BIMGO has relatively low sensitivity to these parameters, and the differences between the results obtained with different settings are small.
The advantages of the BIMGO can be summarized in six points. First, the BIMGO generates solutions with better fitness values than the competing algorithms in most cases, especially on high-dimensional datasets. Second, it exhibits a stronger convergence ability than the competing algorithms. Third, it reduces the feature size by 66.39% on average across medical datasets of different dimensions, a result significantly better than those of the competing algorithms. Fourth, it achieves the best performance in terms of recall (sensitivity), indicating an excellent ability to identify true positives, that is, disease features; this makes the BIMGO suitable for feature selection on medical data. Fifth, the statistical analysis shows that the solutions generated by the BIMGO are significantly better than those of the other optimization algorithms in most cases. Finally, the design of the BIMGO is clear and easy to understand, so further enhancements can easily be built on it.
Although the evaluation experiments indicate that the BIMGO performs well on the FS problem for medical data, it still has some limitations. First, it exhibits only average performance on low-dimensional datasets, and it cannot select fewer features on these datasets. Second, its computational cost is high, so it is not suitable for applications that are sensitive to computational cost. In addition, the SVM was chosen as the learning scheme in this study; although the SVM achieves good accuracy, it has a high time cost and is sensitive to noisy data. Finally, only feature selection for medical datasets was studied, and the application of the BIMGO to other domains and problem types requires further research.

V. Conclusion and future works
To address the feature selection problem, we proposed an improved MGO algorithm (IMGO) and its binary version, the BIMGO. To improve its optimization ability, the proposed algorithm uses four new strategies: initializing the population with the ICMIC, introducing a nonlinear control factor to control the search process, perturbing the individuals in each iteration with a spiral perturbation mechanism, and performing a neighborhood search around the optimal individual. These strategies improve the diversity of the population, enhance the exploitation and convergence abilities, and improve the search efficiency and the solution quality of the algorithm. Two sets of experiments were conducted to verify the ability of the IMGO to solve continuous problems and of the BIMGO to solve feature selection problems. In the first set, the IMGO was compared with competitive algorithms on 23 benchmark functions, and the results showed that it effectively outperformed them, achieving the best results on 18 benchmark functions. In the second set, we evaluated the BIMGO on 16 medical datasets with dimensions ranging from 10 to 10509 and compared it with 8 well-known metaheuristic algorithms in terms of the fitness value, number of selected features, and sensitivity. The BIMGO achieved the best fitness values on 10 datasets, and its overall performance was the best among all algorithms. In terms of sensitivity, the BIMGO achieved the best results on 10 datasets and the best overall performance, which indicates its applicability to medical data. The BIMGO selects as few features as possible while ensuring accuracy, with an average feature reduction rate of 66.39%. Overall, the BIMGO significantly outperformed the compared algorithms in most cases. Finally, a statistical analysis with the Wilcoxon rank-sum test indicated that the BIMGO significantly outperformed the competing algorithms on feature selection for medical data. However, the BIMGO still has limitations: its performance on low-dimensional datasets is mediocre, and its computational cost is high. In the future, we will further improve the algorithm and study its performance in combination with different classifiers. Moreover, beyond classical algorithms, we will include other novel metaheuristic algorithms as comparison algorithms to further evaluate and improve the BIMGO. We would also like to apply the algorithm to other fields, such as finance and engineering.
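The ICMIC map used for initialization is x_{k+1} = sin(α / x_k), which keeps its iterates in [-1, 1]. A minimal Python sketch of chaotic population initialization follows; the value of α, the seed x0, and the linear mapping of chaotic values onto the search bounds are assumptions for illustration, since this section does not specify them.

```python
import math
from typing import List


def icmic_population(n: int, dim: int, lb: float, ub: float,
                     alpha: float = 2.0, x0: float = 0.7) -> List[List[float]]:
    """Initialize n individuals in [lb, ub]^dim from one ICMIC chaotic sequence.

    ICMIC map: x_{k+1} = sin(alpha / x_k), with x_k in [-1, 1] and x_k != 0.
    alpha and x0 are illustrative defaults, not values taken from the paper.
    """
    if x0 == 0.0 or not -1.0 <= x0 <= 1.0:
        raise ValueError("x0 must be a nonzero value in [-1, 1]")
    population = []
    x = x0
    for _ in range(n):
        individual = []
        for _ in range(dim):
            x = math.sin(alpha / x)
            if x == 0.0:  # the map is undefined at 0; nudge away from it
                x = 1e-12
            # linearly map the chaotic value from [-1, 1] to [lb, ub]
            individual.append(lb + (x + 1.0) / 2.0 * (ub - lb))
        population.append(individual)
    return population


if __name__ == "__main__":
    pop = icmic_population(n=30, dim=5, lb=-100.0, ub=100.0)
    print(len(pop), len(pop[0]))  # 30 5
```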

Fig 6. Convergence curves of each algorithm based on the fitness value.

5) Discussions. This paper presents an effective solution for feature selection on medical datasets of different dimensions. The algorithm continuously searches the solution space using four different mechanisms, and this strong exploration capability enables the BIMGO to quickly find feasible solutions during feature selection, especially on high-dimensional medical datasets. Introducing the ICMIC in the initialization phase increases the population diversity, and the nonlinear control factor balances exploration and exploitation and improves the search efficiency of the algorithm. In addition, the spiral perturbation mechanism and the neighborhood search strategy for the optimal individual perform local searches around the current individual and the optimal individual, respectively, during the iterations, which effectively improves the quality of the feasible solutions; this is confirmed by the excellent recall. The results of the comparative experiments indicate that the BIMGO performs well in feature selection for medical data.
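The neighborhood search for the optimal individual can be sketched from the interval given in step 23 of the algorithm listing, [best − |best − sbest|, best + |best − sbest|], followed by greedy replacement when the candidate is fitter (step 24). In this minimal Python sketch, sbest is read as a second reference individual (for example, the second-best gazelle); that reading, per-dimension uniform sampling, and clipping to the bounds are assumptions, and the function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence


def neighborhood_search(best: Sequence[float], sbest: Sequence[float],
                        fitness: Callable[[Sequence[float]], float],
                        lb: float, ub: float, rng: random.Random) -> List[float]:
    """One neighborhood-search step around the best individual (minimization).

    Each coordinate of the candidate Y is drawn uniformly from
    [best_j - |best_j - sbest_j|, best_j + |best_j - sbest_j|], clipped to
    [lb, ub]; Y replaces best only if its fitness is strictly lower.
    """
    candidate = []
    for b, s in zip(best, sbest):
        radius = abs(b - s)
        value = rng.uniform(b - radius, b + radius)
        candidate.append(min(max(value, lb), ub))  # boundary handling is an assumption
    return candidate if fitness(candidate) < fitness(best) else list(best)


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(0)
    best, sbest = [1.0, -2.0, 0.5], [1.5, -1.0, 0.0]  # hypothetical individuals
    for _ in range(50):
        best = neighborhood_search(best, sbest, sphere, -100.0, 100.0, rng)
    print(best, sphere(best))
```

The greedy acceptance means this step can never worsen the best solution, so it adds exploitation without risking the elite individual.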

5: While (current iteration t < T) do
6:   For each gazelle i from 1 to N do
7:     Calculate the nonlinear control factor a using Eq (12); % Introducing a
     Sort all individuals of the population in ascending order;
21:  Update best Gazelle;
22:  % The search in the optimal individual's neighborhood
23:  Generate a new individual Y randomly in [best Gazelle − abs(best Gazelle − sbest Gazelle), best Gazelle + abs(best Gazelle − sbest Gazelle)];
24:  If the fitness value of Y < the fitness value of best Gazelle then % Selecting of the current best individual and the better individual in